Based on assignments by Lisa Zhang and Jimmy Ba.
In this lab, you will build models to perform image colourization. That is, given a greyscale image, we wish to predict the colour at each pixel. Image colourization is a difficult problem for many reasons, one of which being that it is ill-posed: for a single greyscale image, there can be multiple, equally valid colourings.
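To make the ill-posedness concrete, here is a small illustration that is not part of the provided starter code. It assumes greyscale is computed as the plain mean of the three colour channels, as the preprocessing in this lab does, and shows two clearly different colours collapsing to the same grey value. The pixel values are made up for the example.

```python
# Two hypothetical RGB pixels with channel values in [0, 1].
reddish = (0.9, 0.3, 0.3)
bluish = (0.3, 0.3, 0.9)

def to_grey(rgb):
    """Greyscale as the plain mean of the three colour channels."""
    return sum(rgb) / len(rgb)

grey_a = to_grey(reddish)
grey_b = to_grey(bluish)

# Both colours map to the same grey level, so the grey value alone
# cannot tell a model which colour was originally there.
print(grey_a, grey_b)
```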
To keep the training time manageable we will use the CIFAR-10 data set, which consists of images of size 32x32 pixels. For most of the questions we will use a subset of the dataset. The data loading script is included with the notebooks, and should download automatically the first time it is loaded.
We will start with a convolutional autoencoder and tweak it along the way to improve our performance. Then, as a second part of the assignment, we will compare the autoencoder approach to conditional generative adversarial networks (cGANs).
In the process, you are expected to learn to:
Submit an HTML file containing all your code, outputs, and write-up from parts A and B. You can produce an HTML file directly from Google Colab. The Colab instructions are provided at the end of this document.
Do not submit any other files produced by your code.
Include a link to your colab file in your submission.
Please use Google Colab to complete this assignment. If you want to use Jupyter Notebook, please complete the assignment and upload your Jupyter Notebook file to Google Colab for submission.
Include a link to your Colab file here. If you would like the TA to look at your Colab file in case your solutions are cut off, please make sure that your Colab file is publicly accessible at the time of submission.
Colab Link: https://colab.research.google.com/drive/1IKLwzHC6RkKBDbSwXuFuOeWRyz8QKKgm?usp=sharing
In this part we will construct and compare different autoencoder models for the image colourization task.
Provided are some helper functions for loading and preparing the data. Note that you will need to use the Colab GPU for this assignment.
"""
Colourization of CIFAR-10 Horses via classification.
"""
import argparse
import math
import time
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import numpy.random as npr
import scipy.misc
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
######################################################################
# Setup working directory
######################################################################
%mkdir -p /content/a3/
%cd /content/a3
######################################################################
# Helper functions for loading data
######################################################################
# adapted from
# https://github.com/fchollet/keras/blob/master/keras/datasets/cifar10.py
import os
import pickle
import sys
import tarfile
import numpy as np
from PIL import Image
from six.moves.urllib.error import HTTPError, URLError  # needed by the except clauses in get_file
from six.moves.urllib.request import urlretrieve
def get_file(fname, origin, untar=False, extract=False, archive_format="auto", cache_dir="data"):
datadir = os.path.join(cache_dir)
if not os.path.exists(datadir):
os.makedirs(datadir)
if untar:
untar_fpath = os.path.join(datadir, fname)
fpath = untar_fpath + ".tar.gz"
else:
fpath = os.path.join(datadir, fname)
print("File path: %s" % fpath)
if not os.path.exists(fpath):
print("Downloading data from", origin)
error_msg = "URL fetch failure on {}: {} -- {}"
try:
try:
urlretrieve(origin, fpath)
except URLError as e:
raise Exception(error_msg.format(origin, e.errno, e.reason))
except HTTPError as e:
raise Exception(error_msg.format(origin, e.code, e.msg))
except (Exception, KeyboardInterrupt) as e:
if os.path.exists(fpath):
os.remove(fpath)
raise
if untar:
if not os.path.exists(untar_fpath):
print("Extracting file.")
with tarfile.open(fpath) as archive:
archive.extractall(datadir)
return untar_fpath
if extract:
_extract_archive(fpath, datadir, archive_format)
return fpath
def load_batch(fpath, label_key="labels"):
"""Internal utility for parsing CIFAR data.
# Arguments
fpath: path the file to parse.
label_key: key for label data in the retrieve
dictionary.
# Returns
A tuple `(data, labels)`.
"""
f = open(fpath, "rb")
if sys.version_info < (3,):
d = pickle.load(f)
else:
d = pickle.load(f, encoding="bytes")
# decode utf8
d_decoded = {}
for k, v in d.items():
d_decoded[k.decode("utf8")] = v
d = d_decoded
f.close()
data = d["data"]
labels = d[label_key]
data = data.reshape(data.shape[0], 3, 32, 32)
return data, labels
def load_cifar10(transpose=False):
"""Loads CIFAR10 dataset.
# Returns
Tuple of Numpy arrays: `(x_train, y_train), (x_test, y_test)`.
"""
dirname = "cifar-10-batches-py"
origin = "http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
path = get_file(dirname, origin=origin, untar=True)
num_train_samples = 50000
x_train = np.zeros((num_train_samples, 3, 32, 32), dtype="uint8")
y_train = np.zeros((num_train_samples,), dtype="uint8")
for i in range(1, 6):
fpath = os.path.join(path, "data_batch_" + str(i))
data, labels = load_batch(fpath)
x_train[(i - 1) * 10000 : i * 10000, :, :, :] = data
y_train[(i - 1) * 10000 : i * 10000] = labels
fpath = os.path.join(path, "test_batch")
x_test, y_test = load_batch(fpath)
y_train = np.reshape(y_train, (len(y_train), 1))
y_test = np.reshape(y_test, (len(y_test), 1))
if transpose:
x_train = x_train.transpose(0, 2, 3, 1)
x_test = x_test.transpose(0, 2, 3, 1)
return (x_train, y_train), (x_test, y_test)
# Download CIFAR dataset
m = load_cifar10()
# code to examine the dataset
print('Number of samples in the x_train data: ', len(m[0][0]))
print('Number of samples in the y_train data: ',len(m[0][1]))
print('Number of samples in the x_test data: ',len(m[1][0]))
print('Number of samples in the y_test data: ',len(m[1][1]))
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
ax = fig.add_subplot(2, 20 // 2, idx+1, xticks=[], yticks=[])  # integer division: subplot counts must be ints
new_shape=np.transpose(m[0][0][idx], (1, 2, 0))
plt.imshow(np.transpose(m[0][0][idx], (1, 2, 0)))
# print(new_shape.shape)
ax.set_title('Sample x training')
print(new_shape.shape)
Preprocess the data to select only images of horses. Learning to generate only horse images will make our task easier. Your function will also convert the colour images to greyscale to create our input data.
# select a single category.
HORSE_CATEGORY = 7
# convert colour images into greyscale
def process(xs, ys, max_pixel=256.0, downsize_input=False):
"""
Pre-process CIFAR10 images by taking only the horse category,
shuffling, and have colour values be bound between 0 and 1
Args:
xs: the colour RGB pixel values
ys: the category labels
max_pixel: maximum pixel value in the original data
Returns:
xs: value normalized and shuffled colour images
grey: greyscale images, also normalized so values are between 0 and 1
"""
xs = xs / max_pixel
xs = xs[np.where(ys == HORSE_CATEGORY)[0], :, :, :]
npr.shuffle(xs)
grey = np.mean(xs, axis=1, keepdims=True)
if downsize_input:
downsize_module = nn.Sequential(
nn.AvgPool2d(2),
nn.AvgPool2d(2),
nn.Upsample(scale_factor=2),
nn.Upsample(scale_factor=2),
)
xs_downsized = downsize_module.forward(torch.from_numpy(xs).float())
xs_downsized = xs_downsized.data.numpy()
return (xs, xs_downsized)
else:
return (xs, grey)
Create a dataloader (or function) to batch the samples.
# dataloader for batching samples
def get_batch(x, y, batch_size):
"""
Generator that yields batches of data
Args:
x: input values
y: output values
batch_size: size of each batch
Yields:
batch_x: a batch of inputs of size at most batch_size
batch_y: a batch of outputs of size at most batch_size
"""
N = np.shape(x)[0]
assert N == np.shape(y)[0]
for i in range(0, N, batch_size):
batch_x = x[i : i + batch_size, :, :, :]
batch_y = y[i : i + batch_size, :, :, :]
yield (batch_x, batch_y)
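The "at most batch_size" wording in the docstring can be checked on a toy array. The sketch below repeats a compact copy of get_batch so that it runs on its own; the array shapes are made up and much smaller than the real data.

```python
import numpy as np

def get_batch(x, y, batch_size):
    """Compact copy of the generator above, repeated so this sketch runs alone."""
    n = np.shape(x)[0]
    assert n == np.shape(y)[0]
    for i in range(0, n, batch_size):
        yield x[i : i + batch_size], y[i : i + batch_size]

# Toy stand-ins: 10 greyscale inputs and 10 RGB targets of size 4x4.
grey = np.zeros((10, 1, 4, 4))
rgb = np.zeros((10, 3, 4, 4))

batch_sizes = [bx.shape[0] for bx, _ in get_batch(grey, rgb, batch_size=4)]
print(batch_sizes)  # the last batch holds only the leftover samples
```

Because the last batch can be shorter than the nominal size, code that builds per-sample tensors (for example label vectors) is safer when it takes the size from the batch itself.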
Verify and visualize that we are able to generate different batches of data.
# code to load different batches of horse dataset
print("Loading data...")
(x_train, y_train), (x_test, y_test) = load_cifar10()
print("Transforming data...")
train_rgb, train_grey = process(x_train, y_train)
test_rgb, test_grey = process(x_test, y_test)
# shape of data and labels before selection
print(x_train.shape, y_train.shape)
# shape of training data
print('Training Data: ', train_rgb.shape, train_grey.shape)
# shape of testing data
print('Testing Data: ', test_rgb.shape, test_grey.shape)
Load Batches
# obtain batches of images
xs, ys = next(iter(get_batch(train_grey, train_rgb, 10)))
print(xs.shape, ys.shape)
Visualization
# visualize 5 train/test images
xtrains, ytrains = next(iter(get_batch(train_grey, train_rgb, 5)))
xtests, ytests = next(iter(get_batch(test_grey, test_rgb, 5)))
fig = plt.figure(figsize=(20, 4))
for idx in np.arange(5):
ax = fig.add_subplot(1, 5, idx+1, xticks=[], yticks=[])
new_shape=np.transpose(xtrains[idx], (1, 2, 0))
new_shape = np.squeeze(new_shape)
plt.imshow(new_shape)
# plt.imshow(np.transpose(m[0][0][idx], (1, 2, 0)))
ax.set_title('Sample x training')
fig = plt.figure(figsize=(20, 4))
for idx in np.arange(5):
ax = fig.add_subplot(1, 5, idx+1, xticks=[], yticks=[])
new_shape=np.transpose(xtests[idx], (1, 2, 0))
new_shape = np.squeeze(new_shape)
plt.imshow(new_shape)
# plt.imshow(np.transpose(m[0][0][idx], (1, 2, 0)))
ax.set_title('Sample x testing')
There are many ways to frame the problem of image colourization as a machine learning problem. One naive approach is to frame it as a regression problem, where we build a model to predict the RGB intensities at each pixel given the greyscale input. In this case, the outputs are continuous, and so squared error can be used to train the model.
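One consequence of the squared-error framing is worth seeing in numbers. If the same grey value appears with several different colours in the training data, the single prediction that minimizes the squared error is their mean, which tends to be a washed-out colour. The sketch below is an illustration with two made-up target colours, not part of the starter code.

```python
# Two hypothetical, equally plausible colours for the same grey pixel.
targets = [(0.9, 0.3, 0.3), (0.3, 0.3, 0.9)]

def mean_squared_error(pred, targets):
    """Average squared error of one prediction against all targets."""
    total = 0.0
    for t in targets:
        total += sum((p - c) ** 2 for p, c in zip(pred, t))
    return total / (len(targets) * len(pred))

# The per-channel mean of the targets.
mean_colour = tuple(sum(ch) / len(targets) for ch in zip(*targets))

loss_mean = mean_squared_error(mean_colour, targets)
loss_red = mean_squared_error(targets[0], targets)
loss_blue = mean_squared_error(targets[1], targets)

# Committing to either vivid colour costs more than predicting the
# desaturated average of the two.
print(mean_colour, loss_mean, loss_red, loss_blue)
```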
In this section, you will get familiar with training neural networks using cloud GPUs. Run the helper code and answer the questions that follow.
Regression Architecture
class RegressionCNN(nn.Module):
def __init__(self, kernel, num_filters):
# first call parent's initialization function
super().__init__()
padding = kernel // 2
self.downconv1 = nn.Sequential(
nn.Conv2d(1, num_filters, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters),
nn.ReLU(),
nn.MaxPool2d(2),)
self.downconv2 = nn.Sequential(
nn.Conv2d(num_filters, num_filters*2, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters*2),
nn.ReLU(),
nn.MaxPool2d(2),)
self.rfconv = nn.Sequential(
nn.Conv2d(num_filters*2, num_filters*2, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters*2),
nn.ReLU())
self.upconv1 = nn.Sequential(
nn.Conv2d(num_filters*2, num_filters, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters),
nn.ReLU(),
nn.Upsample(scale_factor=2),)
self.upconv2 = nn.Sequential(
nn.Conv2d(num_filters, 3, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(3),
nn.ReLU(),
nn.Upsample(scale_factor=2),)
self.finalconv = nn.Conv2d(3, 3, kernel_size=kernel, padding=padding)
def forward(self, x):
out = self.downconv1(x)
out = self.downconv2(out)
out = self.rfconv(out)
out = self.upconv1(out)
out = self.upconv2(out)
out = self.finalconv(out)
return out
Training code
class AttrDict(dict):
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
self.__dict__ = self
def get_torch_vars(xs, ys, gpu=False):
"""
Helper function to convert numpy arrays to pytorch tensors.
If GPU is used, move the tensors to GPU.
Args:
xs (float numpy tensor): greyscale input
ys (float numpy tensor): RGB targets
gpu (bool): whether to move pytorch tensor to GPU
Returns:
Variable(xs), Variable(ys)
"""
xs = torch.from_numpy(xs).float()
ys = torch.from_numpy(ys).float()
if gpu:
xs = xs.cuda()
ys = ys.cuda()
return Variable(xs), Variable(ys)
def train(args, gen=None):
# Numpy random seed
npr.seed(args.seed)
# Save directory
save_dir = "outputs/" + args.experiment_name
# LOAD THE MODEL
if gen is None:
Net = globals()[args.model]
gen = Net(args.kernel, args.num_filters)
# LOSS FUNCTION
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(gen.parameters(), lr=args.learn_rate)
# DATA
print("Loading data...")
(x_train, y_train), (x_test, y_test) = load_cifar10()
print("Transforming data...")
train_rgb, train_grey = process(x_train, y_train, downsize_input=args.downsize_input)
test_rgb, test_grey = process(x_test, y_test, downsize_input=args.downsize_input)
# Create the outputs folder if not created already
if not os.path.exists(save_dir):
os.makedirs(save_dir)
print("Beginning training ...")
if args.gpu:
gen.cuda()
start = time.time()
train_losses = []
valid_losses = []
valid_accs = []
for epoch in range(args.epochs):
# Train the Model
gen.train() # Change model to 'train' mode
losses = []
for i, (xs, ys) in enumerate(get_batch(train_grey, train_rgb, args.batch_size)):
images, labels = get_torch_vars(xs, ys, args.gpu)
# Forward + Backward + Optimize
optimizer.zero_grad()
outputs = gen(images)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
losses.append(loss.data.item())
print(epoch, loss.cpu().detach())
if args.plot:
visual(images, labels, outputs, args.gpu, 1)
return gen
Training visualization code
# visualize 5 train/test images
def visual(img_grey, img_real, img_fake, gpu = 0, flag_torch = 0):
if gpu:
img_grey = img_grey.cpu().detach()
img_real = img_real.cpu().detach()
img_fake = img_fake.cpu().detach()
if flag_torch:
img_grey = img_grey.numpy()
img_real = img_real.numpy()
img_fake = img_fake.numpy()
if flag_torch == 2:
img_real = np.transpose(img_real[:, :, :, :, :], [0, 4, 2, 3, 1]).squeeze()
img_fake = np.transpose(img_fake[:, :, :, :, :], [0, 4, 2, 3, 1]).squeeze()
#correct image structure
img_grey = np.transpose(img_grey[:5, :, :, :], [0, 2, 3, 1]).squeeze()
img_real = np.transpose(img_real[:5, :, :, :], [0, 2, 3, 1])
img_fake = np.transpose(img_fake[:5, :, :, :], [0, 2, 3, 1])
for i in range(5):
ax = plt.subplot(3, 5, i + 1)
ax.imshow(img_grey[i], cmap='gray')
ax.axis("off")
ax = plt.subplot(3, 5, i + 1 + 5)
ax.imshow(img_real[i])
ax.axis("off")
ax = plt.subplot(3, 5, i + 1 + 10)
ax.imshow(img_fake[i])
ax.axis("off")
plt.show()
Main training loop for regression CNN
#Main training loop for CNN
args = AttrDict()
args_dict = {
"gpu": True,
"valid": False,
"checkpoint": "",
"colours": "./data/colours/colour_kmeans24_cat7.npy",
"model": "RegressionCNN",
"kernel": 3,
"num_filters": 32,
'learn_rate':0.001,
"batch_size": 100,
"epochs": 25,
"seed": 0,
"plot": True,
"experiment_name": "colourization_cnn",
"visualize": False,
"downsize_input": False,
}
args.update(args_dict)
cnn = train(args)
Describe the model RegressionCNN. How many convolution layers does it have? What are the filter sizes and number of filters at each layer? Construct a table or draw a diagram.
Answer:
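One way to build such a table is to derive it from the constructor arguments. The sketch below is plain Python and does not import the model: it re-lists the six convolutions of RegressionCNN as read from the class definition above, for kernel=3 and num_filters=32 as in the training configuration, and counts weights plus biases per convolution. The helper name conv_params is made up for this example, and batch-norm parameters are left out.

```python
kernel, nf = 3, 32  # values from the training configuration

# (layer name, input channels, output channels, output height/width
#  after the pooling or upsampling that follows, for a 32x32 input)
layers = [
    ("downconv1", 1, nf, 16),
    ("downconv2", nf, nf * 2, 8),
    ("rfconv", nf * 2, nf * 2, 8),
    ("upconv1", nf * 2, nf, 16),
    ("upconv2", nf, 3, 32),
    ("finalconv", 3, 3, 32),
]

def conv_params(c_in, c_out, k):
    """Weights (k*k*c_in per filter) plus one bias per filter."""
    return (k * k * c_in + 1) * c_out

total = 0
for name, c_in, c_out, size in layers:
    n = conv_params(c_in, c_out, kernel)
    total += n
    print(f"{name:10s} {kernel}x{kernel}  {c_in:3d} -> {c_out:3d}  "
          f"out {size}x{size}  params {n}")
print("convolution layers:", len(layers), "total conv params:", total)
```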
Run the regression training code (should run without errors). This will generate some images. How many epochs are we training the CNN model in the given setting?
Answer: 25 epochs
Re-train a couple of new models using a different number of training epochs. You may train each new models in a new code cell by copying and modifying the code from the last notebook cell. Comment on how the results (output images, training loss) change as we increase or decrease the number of epochs.
#Main training loop for CNN
args = AttrDict()
args_dict = {
"gpu": True,
"valid": False,
"checkpoint": "",
"colours": "./data/colours/colour_kmeans24_cat7.npy",
"model": "RegressionCNN",
"kernel": 3,
"num_filters": 32,
'learn_rate':0.001,
"batch_size": 100,
"epochs": 10,
"seed": 0,
"plot": True,
"experiment_name": "colourization_cnn",
"visualize": False,
"downsize_input": False,
}
args.update(args_dict)
cnn = train(args)
#Main training loop for CNN
args = AttrDict()
args_dict = {
"gpu": True,
"valid": False,
"checkpoint": "",
"colours": "./data/colours/colour_kmeans24_cat7.npy",
"model": "RegressionCNN",
"kernel": 3,
"num_filters": 32,
'learn_rate':0.001,
"batch_size": 100,
"epochs": 50,
"seed": 0,
"plot": True,
"experiment_name": "colourization_cnn",
"visualize": False,
"downsize_input": False,
}
args.update(args_dict)
cnn = train(args)
Answer: With fewer epochs (10), the training loss is higher and the output images are blurrier; with more epochs (50), the loss is lower and the images are sharper. I believe that even 50 epochs are not quite enough to colourize the images as desired.
A skip connection in a neural network is a connection that skips one or more layers and connects to a later layer. We will now introduce skip connections into our model.
Add a skip connection from the first layer to the last, second layer to the second last, etc. That is, the final convolution should have both the output of the previous layer and the initial greyscale input as input. This type of skip-connection is introduced by [3], and is called a "UNet". Following the CNN class that you have completed, complete the init and forward methods of the UNet class. Hint: You will need to use the function torch.cat.
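Concatenating along the channel dimension changes how many input channels the next convolution must accept. The sketch below uses np.concatenate as a stand-in for torch.cat so that it runs without PyTorch. The shapes follow one possible UNet layout with num_filters=32, in which a decoder output is concatenated with the encoder output of the same spatial size; the batch size is made up.

```python
import numpy as np

batch, nf = 4, 32  # small hypothetical batch; nf plays the role of num_filters

# Stand-ins for feature maps in (batch, channels, height, width) layout.
enc2 = np.zeros((batch, nf * 2, 8, 8))    # second encoder block output
dec2 = np.zeros((batch, nf, 8, 8))        # first decoder block output

# Concatenate along the channel axis (axis 1), as torch.cat(..., dim=1) would.
skip2 = np.concatenate((dec2, enc2), axis=1)
print(skip2.shape)

# The next convolution must therefore accept nf + 2 * nf input channels.
next_in_channels = skip2.shape[1]
print(next_in_channels)
```

Only the channel counts may differ between the concatenated tensors; batch size, height, and width have to match.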
from numpy.core.fromnumeric import shape
#complete the code
class UNet(nn.Module):
def __init__(self, kernel, num_filters, num_colours=3, num_in_channels=1):
super().__init__()
# Useful parameters
stride = 2
padding = kernel // 2
output_padding = 1
############### YOUR CODE GOES HERE ###############
###################################################
self.downconv1 = nn.Sequential(
nn.Conv2d(num_in_channels, num_filters, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters),
nn.ReLU(),
nn.MaxPool2d(2),)
self.downconv2 = nn.Sequential(
nn.Conv2d(num_filters, num_filters*2, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters*2),
nn.ReLU(),
nn.MaxPool2d(2),)
self.rfconv = nn.Sequential(
nn.Conv2d(num_filters*2, num_filters*2, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters*2),
nn.ReLU())
self.upconv1 = nn.Sequential(
nn.Conv2d(num_filters*2, num_filters, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters),
nn.ReLU(),)
# nn.Upsample(scale_factor=2),)
self.upsample = nn.Sequential(
nn.Upsample(scale_factor=2),)
self.upconv2 = nn.Sequential(
nn.Conv2d(num_filters*3, num_colours-1, kernel_size=kernel, padding=padding),  # input: upconv1 output concatenated with enc2
nn.BatchNorm2d(num_colours-1),
nn.ReLU(),)
# nn.Upsample(scale_factor=2),)
self.finalconv = nn.Conv2d(num_colours-1 + num_filters, num_colours, kernel_size=kernel, padding=padding)  # input: upconv2 output concatenated with enc1
def forward(self, x):
############### YOUR CODE GOES HERE ###############
###################################################
enc1 = self.downconv1(x)
enc2 = self.downconv2(enc1)
bn = self.rfconv(enc2)
dec2 = self.upconv1(bn)
dec2 = torch.cat((dec2,enc2), dim=1)
dec2 = self.upsample(dec2)
dec1 = self.upconv2(dec2)
dec1 = torch.cat((dec1,enc1), dim=1)
# print('dec1-1 dimension: ', dec1.shape)
# print('dec1-2 dimension: ', dec1.shape)
dec1 = self.upsample(dec1)
# dec1 = torch.cat((dec1, x), dim=1)
# print('dec1-3 dimension: ', dec1.shape)
# print('dec1-4 dimension: ', dec1.shape)
# print(dec1.shape)
out = self.finalconv(dec1)
return out
Train the "UNet" model for the same number of epochs as the previous CNN, using a batch size of 100, and plot the training curve. How does the result compare to the previous model? Did skip connections improve the validation loss and accuracy? Did the skip connections improve the output qualitatively? How? Give at least two reasons why skip connections might improve the performance of our CNN models.
# Main training loop for UNet
args = AttrDict()
args_dict = {
"gpu": True,
"valid": False,
"checkpoint": "",
"colours": "./data/colours/colour_kmeans24_cat7.npy",
"model": "UNet",
"kernel": 3,
"num_filters": 32,
'learn_rate':0.001,
"batch_size": 100,
"epochs": 25,
"seed": 0,
"plot": True,
"experiment_name": "colourization_cnn",
"visualize": False,
"downsize_input": False,
}
args.update(args_dict)
cnn = train(args)
Re-train a few more "UNet" models using different mini batch sizes with a fixed number of epochs. Describe the effect of batch sizes on the training/validation loss, and the final image output.
The loss appears to decrease as the batch size decreases.
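One factor that may contribute to this trend is the number of parameter updates: with a fixed number of epochs, a smaller batch means more optimizer steps. The sketch below counts them, assuming 5,000 training images (the number of horse images in the CIFAR-10 training set) and the batch sizes tried in this part. The function name is made up for the example.

```python
import math

def total_updates(num_train, batch_size, epochs):
    """Number of optimizer steps when every batch triggers one update."""
    steps_per_epoch = math.ceil(num_train / batch_size)
    return steps_per_epoch * epochs

num_train = 5000  # CIFAR-10 has 5,000 training images per class
epochs = 25

for batch_size in (25, 100, 250):
    print(batch_size, total_updates(num_train, batch_size, epochs))
```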
# complete the code
# Main training loop for UNet
args = AttrDict()
args_dict = {
"gpu": True,
"valid": False,
"checkpoint": "",
"colours": "./data/colours/colour_kmeans24_cat7.npy",
"model": "UNet",
"kernel": 3,
"num_filters": 32,
'learn_rate':0.001,
"batch_size": 25,
"epochs": 25,
"seed": 0,
"plot": True,
"experiment_name": "colourization_cnn",
"visualize": False,
"downsize_input": False,
}
args.update(args_dict)
cnn = train(args)
# Main training loop for UNet
args = AttrDict()
args_dict = {
"gpu": True,
"valid": False,
"checkpoint": "",
"colours": "./data/colours/colour_kmeans24_cat7.npy",
"model": "UNet",
"kernel": 3,
"num_filters": 32,
'learn_rate':0.001,
"batch_size": 250,
"epochs": 25,
"seed": 0,
"plot": True,
"experiment_name": "colourization_cnn",
"visualize": False,
"downsize_input": False,
}
args.update(args_dict)
cnn = train(args)
unet = UNet(3, 32, 3, 1)
# generator = Generator(kernel=3, num_filters=32, num_colours=3, num_in_channels=1).cuda()
img_greyscale = torch.rand(100, 1, 32, 32)
img_fake = unet(img_greyscale)
print(img_fake.shape)
In this second half of the assignment we will construct a conditional generative adversarial network for our image colourization task.
To start, we will modify the previous sample code to construct and train a conditional GAN. We will explore different architectures to identify and select our best image colourization model.
Note: This second half of the assignment should be started after the lecture on generative adversarial networks (GANs).
Modify the provided training code to implement a generator. Then test to verify that it works on the desired input. (Hint: you can reuse some of your earlier autoencoder models here to act as a generator.)
from numpy.core.fromnumeric import shape
#complete the code
class Generator(nn.Module):
def __init__(self, kernel=3, num_filters=32, num_colours=3, num_in_channels=1):
super().__init__()
# Useful parameters
stride = 2
padding = kernel // 2
output_padding = 1
############### YOUR CODE GOES HERE ###############
###################################################
self.downconv1 = nn.Sequential(
nn.Conv2d(num_in_channels, num_filters, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters),
nn.ReLU(),
nn.MaxPool2d(2),)
self.downconv2 = nn.Sequential(
nn.Conv2d(num_filters, num_filters*2, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters*2),
nn.ReLU(),
nn.MaxPool2d(2),)
self.rfconv = nn.Sequential(
nn.Conv2d(num_filters*2, num_filters*2, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters*2),
nn.ReLU())
self.upconv1 = nn.Sequential(
nn.Conv2d(num_filters*2, num_filters, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters),
nn.ReLU(),)
# nn.Upsample(scale_factor=2),)
self.upsample = nn.Sequential(
nn.Upsample(scale_factor=2),)
self.upconv2 = nn.Sequential(
nn.Conv2d(num_filters*3, num_colours, kernel_size=kernel, padding=padding),  # input: upconv1 output concatenated with enc2
nn.BatchNorm2d(num_colours),
nn.ReLU(),)
# nn.Upsample(scale_factor=2),))
self.finalconv = nn.Conv2d(num_colours + num_filters, num_colours, kernel_size=kernel, padding=padding)  # input: upconv2 output concatenated with enc1
def forward(self, x):
############### YOUR CODE GOES HERE ###############
###################################################
enc1 = self.downconv1(x)
enc2 = self.downconv2(enc1)
bn = self.rfconv(enc2)
dec2 = self.upconv1(bn)
dec2 = torch.cat((dec2,enc2), dim=1)
# print(dec2.shape)
dec2 = self.upsample(dec2)
# print('Dec2 shape post upsampling - should be __, 16, 16', dec2.shape)
dec1 = self.upconv2(dec2)
# print(dec1.shape)
dec1 = torch.cat((dec1,enc1), dim=1)
dec1 = self.upsample(dec1)
# print(dec1.shape)
out = self.finalconv(dec1)
return out
#test generator architecture
from torchvision.utils import make_grid
generator = Generator(kernel=3, num_filters=32, num_colours=3, num_in_channels=1).cuda()
img_greyscale = torch.rand(100, 1, 32, 32).cuda()
img_fake = generator(img_greyscale)
print(img_fake.shape)
Modify the provided training code to implement a discriminator. Then test to verify it works on the desired input.
# discriminator code
class Discriminator(nn.Module):
def __init__(self, kernel=3, num_filters=32, num_colours=3, num_in_channels=4):
super().__init__()
# Useful parameters
stride = 2
padding = kernel // 2
output_padding = 1
############### YOUR CODE GOES HERE ###############
###################################################
self.downconv1 = nn.Sequential(
nn.Conv2d(num_in_channels, num_filters, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters),
nn.ReLU(),
nn.MaxPool2d(2),)
self.downconv2 = nn.Sequential(
nn.Conv2d(num_filters, num_filters*2, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters*2),
nn.ReLU(),
nn.MaxPool2d(2),)
self.rfconv = nn.Sequential(
nn.Conv2d(num_filters*2, num_filters*2, kernel_size=kernel, padding=padding),
nn.BatchNorm2d(num_filters*2),
nn.ReLU())
self.flatten = nn.Flatten()
self.fc1 = nn.Linear(num_filters * 2 * 8 * 8, 224)  # 2*num_filters maps of size 8x8 after two poolings of a 32x32 input
self.fc2 = nn.Linear(224, 1)
self.dropout = nn.Dropout(0.3)
self.sig = nn.Sigmoid()
def forward(self, x, img_greyscale):
############### YOUR CODE GOES HERE ###############
###################################################
img = img_greyscale
# print('x, img shapes: ', x.shape, img.shape)
input = torch.cat([x, img], dim=1)
out = self.downconv1(input)
out = self.downconv2(out)
out = self.rfconv(out)
out = self.flatten(out)
out = F.leaky_relu(self.fc1(out))
out = self.dropout(out)
out = self.fc2(out)
# out = torch.tanh(out)
out = self.sig(out)
return out.squeeze()
# test discriminator architecture
discriminator = Discriminator(kernel=3, num_filters=32, num_colours=1, num_in_channels=4).cuda()
img_greyscale = torch.rand(100, 1, 32, 32).cuda()
image_fake = generator(img_greyscale)
img_label_fake = discriminator(x=image_fake, img_greyscale=img_greyscale)
# img_label_fake = discriminator(img_greyscale, image_fake)
print(img_label_fake.shape)
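The input size of the first fully connected layer follows from the shapes in the convolutional stack. Here is a sketch of the arithmetic, assuming 32x32 inputs, num_filters=32, padded convolutions that keep the spatial size, and the two 2x2 max-pooling stages used in the discriminator above. The helper name is made up for this example.

```python
def flattened_features(image_size, num_filters, num_pools=2):
    """Features left after the conv stack, given size-preserving convolutions
    and `num_pools` 2x2 max-pooling stages that each halve height and width."""
    spatial = image_size // (2 ** num_pools)
    channels = num_filters * 2  # the last conv blocks output 2 * num_filters maps
    return channels * spatial * spatial

print(flattened_features(image_size=32, num_filters=32))
```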
Modify the provided training code to implement a conditional GAN.
class AttrDict(dict):
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
self.__dict__ = self
def get_torch_vars(xs, ys, gpu=False):
"""
Helper function to convert numpy arrays to pytorch tensors.
If GPU is used, move the tensors to GPU.
Args:
xs (float numpy tensor): greyscale input
ys (float numpy tensor): RGB targets
gpu (bool): whether to move pytorch tensor to GPU
Returns:
Variable(xs), Variable(ys)
"""
xs = torch.from_numpy(xs).float()
ys = torch.from_numpy(ys).float() #--> ADDED for cGAN
if gpu:
xs = xs.cuda()
ys = ys.cuda()
return Variable(xs), Variable(ys)
def train(args, gen=None, dis=None):
# Set the maximum number of threads to prevent crash in Teaching Labs
# TODO: necessary?
torch.set_num_threads(5)
# Numpy random seed
npr.seed(args.seed)
# Save directory
save_dir = "outputs/" + args.experiment_name
# LOAD THE COLOURS CATEGORIES
# INPUT CHANNEL
num_in_channels = 1 if not args.downsize_input else 3
# LOAD THE MODEL
if gen is None:
Net = globals()[args.model_gen]
gen = Net(args.kernel, args.num_filters,args.num_colours_g, args.num_in_channels_g)
if dis is None:
Net = globals()[args.model_dis]
dis = Net(args.kernel, args.num_filters,args.num_colours_d, args.num_in_channels_d)
# LOSS FUNCTION
criterion_d = nn.BCELoss()
criterion_g = nn.BCELoss()
g_optimizer = torch.optim.Adam(gen.parameters(), lr=args.learn_rate)
d_optimizer = torch.optim.Adam(dis.parameters(), lr=args.learn_rate/10)
# DATA
print("Loading data...")
(x_train, y_train), (x_test, y_test) = load_cifar10()
print("Transforming data...")
train_rgb, train_grey = process(x_train, y_train, downsize_input=args.downsize_input)
test_rgb, test_grey = process(x_test, y_test, downsize_input=args.downsize_input)
# Create the outputs folder if not created already
if not os.path.exists(save_dir):
os.makedirs(save_dir)
print("Beginning training ...")
if args.gpu:
gen.cuda()
dis.cuda()
start = time.time()
train_losses = []
valid_losses = []
valid_accs = []
for epoch in range(args.epochs):
# Train the Model
gen.train()
dis.train()
losses = []
for i, (xs, ys) in enumerate(get_batch(train_grey, train_rgb, args.batch_size)):
images, labels = get_torch_vars(xs, ys, args.gpu)
if args.gpu==True:
img_grey = images.cuda()
img_real = labels.cuda()
batch_size = images.shape[0]  # actual batch size; the last batch may be smaller than args.batch_size
real_labels = Variable(torch.ones(batch_size), requires_grad=True).cuda()
fake_labels = Variable(torch.zeros(batch_size), requires_grad=True).cuda()
################# TRAINING DISCRIMINATOR ######################
d_optimizer.zero_grad()
# Discriminator Losses on Real Images
outputs_d_real = dis(img_real, img_grey)
d_real_loss = criterion_d(outputs_d_real, real_labels).cuda()
# Discriminator Losses on Fake Images
z = Variable(torch.rand(args.batch_size, 1, 32, 32)).cuda()
outputs_g = gen(img_grey)
outputs_d_fake = dis(outputs_g, img_grey)
d_fake_loss = criterion_d(outputs_d_fake, fake_labels).cuda()
# Add up losses and update parameters
d_loss = (d_real_loss + d_fake_loss).cuda()
d_loss.backward()
d_optimizer.step()
################# TRAINING GENERATOR ######################
real_labels = Variable(torch.ones(batch_size), requires_grad=True).cuda()
fake_labels = Variable(torch.zeros(batch_size), requires_grad=True).cuda()
g_optimizer.zero_grad()
# Generator Losses on Fake Images
z = Variable(torch.rand(args.batch_size, 1, 32, 32)).cuda()
fake_images = gen(img_grey)
d_fake = dis(fake_images, img_grey)
g_loss = criterion_g(d_fake, real_labels).cuda()
g_loss.backward()
g_optimizer.step()
else:
img_grey = images
img_real = labels
batch_size = images.shape[0]  # actual batch size; the last batch may be smaller than args.batch_size
real_labels = Variable(torch.ones(batch_size), )
fake_labels = Variable(torch.zeros(batch_size))
################# TRAINING DISCRIMINATOR ######################
d_optimizer.zero_grad()
# Discriminator Losses on Real Images
outputs_d_real = dis(img_real, img_grey)
d_real_loss = criterion_d(outputs_d_real, real_labels)
# Discriminator Losses on Fake Images
z = Variable(torch.rand(args.batch_size, 1, 32, 32))
outputs_g = gen(img_grey)
outputs_d_fake = dis(outputs_g, img_grey)
d_fake_loss = criterion_d(outputs_d_fake, fake_labels)
# Add up losses and update parameters
d_loss = (d_real_loss + d_fake_loss)
d_loss.backward()
d_optimizer.step()
################# TRAINING GENERATOR ######################
g_optimizer.zero_grad()
# Generator Losses on Fake Images
z = Variable(torch.rand(args.batch_size, 1, 32, 32))
fake_images = gen(img_grey)
d_fake = dis(fake_images, img_grey)
g_loss = criterion_g(d_fake, real_labels)
g_loss.backward()
g_optimizer.step()
# visual(images, labels, outputs_g, args.gpu, 1)
losses.append((d_loss.data.item(),g_loss.data.item()))
# print and visualize
print(epoch, g_loss.cpu().detach(), d_loss.cpu().detach())
visual(images, labels, fake_images, args.gpu, 1)
return losses
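The two objectives in the loop above can be written out for a single sample. The following is a plain-Python sketch of binary cross-entropy, the quantity nn.BCELoss computes for one prediction; the discriminator outputs 0.8 and 0.3 are made-up numbers.

```python
import math

def bce(pred, label):
    """Binary cross-entropy for one probability `pred` and a 0/1 `label`."""
    return -(label * math.log(pred) + (1 - label) * math.log(1 - pred))

d_on_real = 0.8  # hypothetical discriminator output for a real colour image
d_on_fake = 0.3  # hypothetical discriminator output for a generated image

# Discriminator: real pairs should score 1, generated pairs should score 0.
d_loss = bce(d_on_real, 1) + bce(d_on_fake, 0)

# Generator: it is rewarded when the discriminator scores its output as real,
# so the same fake prediction is compared against the label 1.
g_loss = bce(d_on_fake, 1)

print(d_loss, g_loss)
```

The same discriminator output on a fake image enters both losses with opposite labels, which is what makes the two networks compete.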
Train a conditional GAN for image colourization.
args = AttrDict()
args_dict = {
"gpu": True,
"valid": False,
"checkpoint": "",
"colours": "./data/colours/colour_kmeans24_cat7.npy",
"model_gen": "Generator",
"model_dis": "Discriminator",
"kernel": 3,
"num_filters": 32,
"num_colours_g": 3,
"num_colours_d": 1,
"num_in_channels_g": 1,
"num_in_channels_d": 4,
'learn_rate':0.0001,
"batch_size": 50,
"epochs": 25,
"seed": 0,
"plot": False,
"experiment_name": "colourization_cnn",
"visualize": False,
"downsize_input": False,
}
args.update(args_dict)
losses = train(args)
#batch size of 50 with 100 epochs seemed to work
fig, ax = plt.subplots()
losses = np.array(losses)
plt.plot(losses.T[0], label='Discriminator')
plt.plot(losses.T[1], label='Generator')
plt.title("Training Losses")
plt.legend()
How does the performance of the cGAN compare with the autoencoder models that you tested in the first half of this assignment?
Answer: The cGAN does not produce images as clear as those from the other models, but the results are acceptable. I also think the other models are much easier to build for similar results, so that's something to keep in mind.
A colour space is a choice of mapping of colours into three-dimensional coordinates. Some colours could be close together in one colour space, but further apart in others. The RGB colour space is probably the most familiar to you; the model used in our regression colourization example computes squared error in RGB colour space. However, most state-of-the-art colourization models do not use the RGB colour space. How could using the RGB colour space be problematic? Your answer should relate how human perception of colour differs from squared distance. You may use the Wikipedia article on colour space to help you answer the question.
Answer: RGB is an additive colour model: its coordinates describe how much red, green, and blue light must be emitted to produce a given colour, which does not match how humans perceive colour. As a result, colours that are close together in squared RGB distance are not necessarily colours that a human would judge to be similar. Other colour spaces, such as the CIE 1931 XYZ colour space, are more closely related to human colour perception.
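To make the point concrete, the sketch below converts sRGB values to CIELAB and compares Euclidean distances in both spaces. CIELAB is not mentioned in the answer above; it is used here only as an illustrative colour space designed to be closer to perceptually uniform. The helper names (`srgb_to_lab`, `dist`) and the two pairs of greens are assumptions chosen for the example, and the conversion assumes sRGB inputs in [0, 1] with a D65 white point.

```python
import math

# D65 reference white (assumed for this sketch)
XN, YN, ZN = 0.95047, 1.0, 1.08883

def _linearize(c):
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def _f(t):
    """CIELAB companding function (cube root with a linear toe)."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

def srgb_to_lab(rgb):
    """Convert an (r, g, b) triple in [0, 1] to (L*, a*, b*)."""
    r, g, b = (_linearize(c) for c in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = _f(x / XN), _f(y / YN), _f(z / ZN)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def dist(p, q):
    """Euclidean distance between two 3-tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two pairs of greens that are equally far apart in RGB
dark_pair = ((0.0, 0.2, 0.0), (0.0, 0.4, 0.0))
bright_pair = ((0.0, 0.8, 0.0), (0.0, 1.0, 0.0))

for name, (c1, c2) in (("dark", dark_pair), ("bright", bright_pair)):
    rgb_d = dist(c1, c2)
    lab_d = dist(srgb_to_lab(c1), srgb_to_lab(c2))
    print(name, "RGB distance:", round(rgb_d, 3), "Lab distance:", round(lab_d, 1))
```

Both pairs are the same distance apart in RGB, yet their CIELAB distances differ, which is the kind of mismatch that a squared-error loss in RGB ignores.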
At this point we have trained a few different generative models for our image colourization task with varying results. What makes this work exciting is that there are many other approaches we could take. In this part of the assignment you will explore at least one of several approaches towards improving our performance on the image colourization task. Some well-known approaches you can consider include:
Other interesting approaches include:
A great example of some of these different approaches can be found in a blog post by Moein Shariatnia.
Note you are only required to pick one of the suggested modifications.
I've chosen to apply L1 loss with the discriminator-based loss!
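One way to combine the two terms is to add a weighted L1 reconstruction penalty to the generator's adversarial loss. Note that `g_loss` in the training loop in this section contains only the adversarial term, so the following is a minimal sketch of how an L1 term could be added, not the exact code used for the results. The function name `generator_loss` and the weight `lambda_l1` (including its default of 100) are assumptions made for illustration, and the discriminator outputs are assumed to be probabilities in [0, 1], matching the use of `nn.BCELoss`.

```python
import torch
from torch import nn

def generator_loss(d_fake_pred, fake_rgb, real_rgb, lambda_l1=100.0):
    """Adversarial loss plus a weighted L1 distance to the ground-truth colours.

    d_fake_pred: discriminator output on generated images, values in [0, 1]
    fake_rgb:    generator output, shape (N, 3, H, W)
    real_rgb:    ground-truth colour images, same shape as fake_rgb
    lambda_l1:   hypothetical weight on the L1 term
    """
    # The generator wants the discriminator to label its images as real (target 1)
    adversarial = nn.functional.binary_cross_entropy(
        d_fake_pred, torch.ones_like(d_fake_pred))
    # Mean absolute error between generated and true colours
    reconstruction = nn.functional.l1_loss(fake_rgb, real_rgb)
    return adversarial + lambda_l1 * reconstruction

# Toy tensors standing in for one batch
d_fake_pred = torch.full((4, 1), 0.5)
fake_rgb = torch.zeros(4, 3, 32, 32)
real_rgb = torch.full((4, 3, 32, 32), 0.1)
loss = generator_loss(d_fake_pred, fake_rgb, real_rgb)
print(loss.item())
```

In the training loop, a function like this could replace the line that computes `g_loss`, with `fake_images` and `img_real` passed as the second and third arguments.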
import torch
from torch import nn
class GANLoss(nn.Module):
    """Adversarial loss that builds the real/fake target tensor internally."""
    def __init__(self, gan_mode='vanilla', real_label=1.0, fake_label=0.0):
        super().__init__()
        # Scalar targets; they are moved to the device of the predictions when used
        self.register_buffer('real_label', torch.tensor(real_label))
        self.register_buffer('fake_label', torch.tensor(fake_label))
        if gan_mode == 'vanilla':
            self.loss = nn.BCELoss()  # expects predictions in [0, 1]
        elif gan_mode == 'lsgan':
            self.loss = nn.MSELoss()
        else:
            raise ValueError(f"Unknown gan_mode: {gan_mode}")
    def get_labels(self, preds, target_is_real):
        labels = self.real_label if target_is_real else self.fake_label
        # Match the device and shape of the predictions (works on CPU and GPU)
        return labels.to(preds.device).expand_as(preds)
    def forward(self, preds, target_is_real):
        labels = self.get_labels(preds, target_is_real)
        return self.loss(preds, labels)
class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self
def get_torch_vars(xs, ys, gpu=False):
    """
    Helper function to convert numpy arrays to pytorch tensors.
    If GPU is used, move the tensors to GPU.
    Args:
        xs (float numpy tensor): greyscale input
        ys (float numpy tensor): RGB target images (cast to float for the cGAN)
        gpu (bool): whether to move pytorch tensor to GPU
    Returns:
        Variable(xs), Variable(ys)
    """
    xs = torch.from_numpy(xs).float()
    ys = torch.from_numpy(ys).float()  # --> ADDED for cGAN
    if gpu:
        xs = xs.cuda()
        ys = ys.cuda()
    return Variable(xs), Variable(ys)
def train(args, gen=None, dis=None):
# Set the maximum number of threads to prevent crash in Teaching Labs
# TODO: necessary?
torch.set_num_threads(5)
# Numpy random seed
npr.seed(args.seed)
# Save directory
save_dir = "outputs/" + args.experiment_name
# LOAD THE COLOURS CATEGORIES
# INPUT CHANNEL
num_in_channels = 1 if not args.downsize_input else 3
# LOAD THE MODEL
if gen is None:
Net = globals()[args.model_gen]
gen = Net(args.kernel, args.num_filters,args.num_colours_g, args.num_in_channels_g)
if dis is None:
Net = globals()[args.model_dis]
dis = Net(args.kernel, args.num_filters,args.num_colours_d, args.num_in_channels_d)
# LOSS FUNCTION
criterion_d = GANLoss(gan_mode='vanilla')
criterion_g = GANLoss(gan_mode='vanilla')
# criterion_d = GANLoss()
# criterion_g = GANLoss()
g_optimizer = torch.optim.Adam(gen.parameters(), lr=args.learn_rate)
d_optimizer = torch.optim.Adam(dis.parameters(), lr=args.learn_rate/10)
# DATA
print("Loading data...")
(x_train, y_train), (x_test, y_test) = load_cifar10()
print("Transforming data...")
train_rgb, train_grey = process(x_train, y_train, downsize_input=args.downsize_input)
test_rgb, test_grey = process(x_test, y_test, downsize_input=args.downsize_input)
# Create the outputs folder if not created already
if not os.path.exists(save_dir):
os.makedirs(save_dir)
print("Beginning training ...")
if args.gpu:
gen.cuda()
dis.cuda()
start = time.time()
train_losses = []
valid_losses = []
valid_accs = []
for epoch in range(args.epochs):
# Train the Model
gen.train()
dis.train()
losses = []
for i, (xs, ys) in enumerate(get_batch(train_grey, train_rgb, args.batch_size)):
images, labels = get_torch_vars(xs, ys, args.gpu)
if args.gpu==True:
img_grey = images.cuda()
img_real = labels.cuda()
batch_size = args.batch_size
real_labels = Variable(torch.ones(batch_size), requires_grad=True).cuda()
fake_labels = Variable(torch.zeros(batch_size), requires_grad=True).cuda()
################# TRAINING DISCRIMINATOR ######################
d_optimizer.zero_grad()
# Discriminator Losses on Real Images
outputs_d_real = dis(img_real, img_grey)
d_real_loss = criterion_d(outputs_d_real, True).cuda()
# Discriminator Losses on Fake Images
z = Variable(torch.rand(args.batch_size, 1, 32, 32)).cuda()
outputs_g = gen(img_grey)
outputs_d_fake = dis(outputs_g.detach(), img_grey)  # detach: the discriminator step should not backpropagate into the generator
d_fake_loss = criterion_d(outputs_d_fake, False).cuda()
# Add up losses and update parameters
d_loss = (d_real_loss + d_fake_loss).cuda()
d_loss.backward()
d_optimizer.step()
################# TRAINING GENERATOR ######################
real_labels = Variable(torch.ones(batch_size), requires_grad=True).cuda()
fake_labels = Variable(torch.zeros(batch_size), requires_grad=True).cuda()
g_optimizer.zero_grad()
# Generator Losses on Fake Images
z = Variable(torch.rand(args.batch_size, 1, 32, 32)).cuda()
fake_images = gen(img_grey)
d_fake = dis(fake_images, img_grey)
g_loss = criterion_g(d_fake, True).cuda()
g_loss.backward()
g_optimizer.step()
else:
img_grey = images
img_real = labels
batch_size = args.batch_size
real_labels = Variable(torch.ones(batch_size), )
fake_labels = Variable(torch.zeros(batch_size))
################# TRAINING DISCRIMINATOR ######################
d_optimizer.zero_grad()
# Discriminator Losses on Real Images
outputs_d_real = dis(img_real, img_grey)
d_real_loss = criterion_d(outputs_d_real, True)
# Discriminator Losses on Fake Images
z = Variable(torch.rand(args.batch_size, 1, 32, 32))
outputs_g = gen(img_grey)
outputs_d_fake = dis(outputs_g.detach(), img_grey)  # detach: the discriminator step should not backpropagate into the generator
d_fake_loss = criterion_d(outputs_d_fake, False)
# Add up losses and update parameters
d_loss = (d_real_loss + d_fake_loss)
d_loss.backward()
d_optimizer.step()
################# TRAINING GENERATOR ######################
g_optimizer.zero_grad()
# Generator Losses on Fake Images
z = Variable(torch.rand(args.batch_size, 1, 32, 32))
fake_images = gen(img_grey)
d_fake = dis(fake_images, img_grey)
g_loss = criterion_g(d_fake, True)
g_loss.backward()
g_optimizer.step()
# visual(images, labels, outputs_g, args.gpu, 1)
losses.append((d_loss.data.item(),g_loss.data.item()))
# print and visualize
print(epoch, g_loss.cpu().detach(), d_loss.cpu().detach())
visual(images, labels, fake_images, args.gpu, 1)
return losses, gen, dis
args = AttrDict()
args_dict = {
"gpu": True,
"valid": False,
"checkpoint": "",
"colours": "./data/colours/colour_kmeans24_cat7.npy",
"model_gen": "Generator",
"model_dis": "Discriminator",
"kernel": 3,
"num_filters": 32,
"num_colours_g": 3,
"num_colours_d": 1,
"num_in_channels_g": 1,
"num_in_channels_d": 4,
'learn_rate':0.0001,
"batch_size": 50,
"epochs": 25,
"seed": 0,
"plot": False,
"experiment_name": "colourization_cnn",
"visualize": False,
"downsize_input": False,
}
args.update(args_dict)
losses, gen, dis = train(args)
# A batch size of 50 with 100 epochs seemed to work
Retrieve sample pictures from online and demonstrate how well your best model performs. Provide all your code.
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sg
from PIL import Image
import requests
import os
import torch
import torchvision
from torchvision import datasets, models, transforms
# mount our Google Drive
from google.colab import drive
drive.mount('/content/drive')
# On colab, dump to local for faster load
!mkdir -p /new_images_horse
!cp -r '/content/drive/My Drive/Colab Notebooks/Horse Photos/' /new_images_horse/
dataset_path = '/new_images_horse/'
data_transform = transforms.Compose([transforms.Resize([32,32]), transforms.ToTensor()])
test_data = datasets.ImageFolder(dataset_path, transform=data_transform)
data_loader = torch.utils.data.DataLoader(test_data, batch_size=5, num_workers=0, shuffle=True)
from PIL import Image
dataiter = iter(data_loader)
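images = next(dataiter)  # the built-in next() works across PyTorch versions; the .next() method was removed
images = images[0]  # keep the image tensor and drop the class labels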
# image2 = images[1]
# image3 = images[2]
images_grey = torch.mean(images, axis=1, keepdim=True)
print(images_grey.shape)
print(images.shape)
def gen_test(batch_size, discriminator, generator, g_optimizer, criterion, real_images, labels):
    # Evaluate the generator loss on a batch of greyscale inputs (no parameter update is performed)
    g_optimizer.zero_grad()
    # z = Variable(torch.randn(batch_size, 100)).cuda()
    # fake_labels = Variable(torch.LongTensor(np.random.randint(0, 10, batch_size))).cuda()
    fake_images = generator(labels)
    validity = discriminator(fake_images, labels)
    g_loss = criterion(validity, True).cuda()
    return g_loss.item()  # item() must be called to return a Python float
def dis_test(batch_size, discriminator, generator, d_optimizer, criterion, real_images, labels):
    # Evaluate the discriminator loss on a batch (no parameter update is performed)
    d_optimizer.zero_grad()
    # loss on real images
    real_validity = discriminator(real_images, labels)
    real_loss = criterion(real_validity, True).cuda()
    # loss on fake images
    # z = Variable(torch.randn(batch_size, 100)).cuda()
    # fake_labels = Variable(torch.LongTensor(np.random.randint(0, 10, batch_size))).cuda()
    fake_images = generator(labels)
    print('Fake image shape ', fake_images.shape)
    fake_validity = discriminator(fake_images, labels)
    fake_loss = criterion(fake_validity, False).cuda()
    d_loss = real_loss + fake_loss
    return d_loss.item()
generator = gen.cuda()
discriminator = dis.cuda()
criterion = GANLoss(gan_mode='vanilla')
g_optimizer = torch.optim.Adam(gen.parameters(), lr=args.learn_rate)
d_optimizer = torch.optim.Adam(dis.parameters(), lr=args.learn_rate/10)
num_epochs= 30
images=images.cuda()
images_grey=images_grey.cuda()
# images_un = tf.unstack(images)
for epoch in range(num_epochs):
real_images = images
print(real_images.shape)
labels = images_grey
# generator.test()
batch_size = 5
d_loss = dis_test(batch_size, discriminator, generator, d_optimizer, criterion, real_images, labels)
g_loss = gen_test(batch_size, discriminator, generator, g_optimizer, criterion, real_images, labels)
fake_image = generator(labels)
visual(labels, real_images, fake_image, gpu=True)
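Because the goal here is only to demonstrate a trained model on new pictures, an optimizer and a loss are not strictly needed. The sketch below shows one way to colourize a batch in inference mode. The helper name `colourize` and the stand-in generator are hypothetical, and the greyscale conversion (mean over the colour channels) follows the code above.

```python
import torch
from torch import nn

def colourize(generator, rgb_batch):
    """Run a generator on the greyscale version of rgb_batch without tracking gradients."""
    was_training = generator.training
    generator.eval()  # inference behaviour for layers such as batch norm
    grey = rgb_batch.mean(dim=1, keepdim=True)  # (N, 3, H, W) -> (N, 1, H, W)
    with torch.no_grad():
        fake = generator(grey)
    generator.train(was_training)  # restore the previous mode
    return grey, fake

# Stand-in generator (hypothetical): a single 1x1 convolution from 1 to 3 channels
toy_generator = nn.Conv2d(1, 3, kernel_size=1)
batch = torch.rand(5, 3, 32, 32)
grey, fake = colourize(toy_generator, batch)
print(grey.shape, fake.shape)
```

With the trained model, a call such as `colourize(gen, images)` would return the greyscale inputs and the generated colour images, which could then be passed to `visual` together with the original images, provided the model and the images are on the same device.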
Detailed instructions for saving to HTML can be found here. A summary of the instructions is provided below:
(1) download your ipynb file by clicking on File -> Download .ipynb
(2) reupload your file to the temporary Google Colab storage (you can access the temporary storage from the tab to the left)
(3) run the following:
%%shell
jupyter nbconvert --to html LAB_3_Generating_Data.ipynb
(4) the html file will be available for download in the temporary Google Colab storage
(5) review the html file and make sure all the results are visible before submitting your assignment to Quercus